Abstract

We investigate the downlink multi-user MIMO (MU-MIMO) scheduling problem in the presence of imperfect Channel State Information at the transmitter (CSIT) that comprises coarse and current CSIT as well as finer but delayed CSIT. This scheduling problem is characterized by an intricate 'exploitation-exploration' tradeoff between scheduling the users based on current CSIT for immediate gains, and scheduling them to obtain finer albeit delayed CSIT and potentially larger future gains. We solve this scheduling problem by formulating a frame-based joint scheduling and feedback approach, where in each frame a policy is obtained as the solution to a Markov Decision Process. We prove that our proposed approach can be made arbitrarily close to the optimal and then demonstrate its significant gains over conventional MU-MIMO scheduling.
I. Introduction

Multiple Input Multiple Output (MIMO) technology is essential for emerging 4G-LTE wireless communication systems. In the downlink of such a system, which typically has several active users, multiple antennas enable simultaneous transmissions to multiple users by allowing the transmitter (base station) to transmit along directions in a signal space in a manner that ensures each user can receive its intended signal along at least one interference-free dimension (a.k.a. the multi-user MIMO principle) [1]. The number of active users is generally greater than the maximum supportable number of simultaneous transmissions, which in turn equals the number of transmit antennas at the base station (BS). Consequently, only a subset of users can be selected for MU-MIMO transmission, and hence proper user scheduling is important to achieve a desired network utility (e.g., throughput, fairness).

The usual assumption made in the existing literature on MU-MIMO scheduling is that the BS can obtain channel state information from all users with sufficient accuracy and with negligible delay. Such information, referred to as the Channel State Information at the Transmitter (CSIT), is crucial to ensure that no scheduled user is dominated by co-channel interference. Typically, the BS obtains CSIT by broadcasting a sequence of pilot symbols; the users in turn estimate their CSI and feed back quantized estimates to the BS. This feedback process introduces two sources of imperfection into the CSIT: (1) estimation and quantization errors (due to limited training and finite codebooks); (2) delays (due to user processing speeds and less flexible scheduling on the feedback channel). The impact of erroneous CSIT on MU-MIMO performance has been analyzed in [?], and utility maximization for MU-MIMO with erroneous CSIT has been considered in [?]. Delay in the CSIT has hitherto been addressed by prediction-based approaches, but their drawback is that they must assume a model for channel evolution, which is difficult to obtain in practice, and they also require the delay to be small enough to allow for useful prediction.

For the scenario in which the number of users is small enough that user scheduling is unnecessary, referred to here as the static scenario, Maddah-Ali and Tse proposed a scheme, namely the MAT scheme [5], that utilizes CSIT that is error-free albeit completely outdated. Their seminal work revealed that outdated CSI is an important resource that, when combined with the information overheard (eavesdropped) at the users, can provide a considerable gain in terms of degrees of freedom. Recently, the MAT scheme was extended (for the static scenario) to the hybrid CSIT case by also incorporating coarse and current CSIT [6] to obtain further system gains. However, in the ubiquitous setting where user scheduling is important, such hybrid CSIT needs to be exploited wisely, since it is costly to obtain even delayed but error-free CSI feedback from all users for making scheduling decisions. Indeed, this problem is quite different from, and more challenging than, the static case. User scheduling for the MAT scheme has been considered in [?], but the suggested method is akin to the myopic approach discussed later in this paper.

In this paper, we study MU-MIMO downlink scheduling with hybrid CSIT, both erroneous and delayed, where the time axis is divided into separate scheduling intervals. We consider the realistic scenario in which current and coarse CSIT is obtained from all users, while more accurate (not necessarily perfect) but delayed CSIT is obtained only from the scheduled users. The scheduling problem is hence characterized by an intricate 'exploitation-exploration' tradeoff between scheduling the users based on current CSIT for immediate gains, and scheduling them to obtain finer albeit delayed CSIT and potentially larger future gains. The contributions of the paper are as follows.

• We tackle the aforementioned 'exploitation-exploration' tradeoff by formulating a frame-based joint scheduling and feedback approach, where in each frame a policy is obtained as the solution to a Markov Decision Process (MDP), which in turn is determined via a state-action frequency approach [10], [11].

• We consider a general utility function and associate with each user a virtual queue that guides the utility achieved by that user. Based on the MDP solutions and the virtual queue evolutions, we show that our proposed frame-based joint scheduling and feedback approach can be made arbitrarily close to the optimal.

In the following, we use $(\cdot)^T$ and $(\cdot)^{\dagger}$ for the transpose and conjugate transpose, respectively. Moreover, $[A, B]$ and $[A; B]$ denote the column-wise and row-wise concatenation of matrices A and B, respectively, and $\|A\|$ denotes the Frobenius norm of the matrix A.

II. System Model and Problem Formulation 

We consider the downlink MU-MIMO scheduling problem with one base station (BS) and N users. The BS is equipped with $M_t$ transmit antennas and employs linear transmit precoding. Each user is equipped with a single receive antenna. Time is divided into intervals, and we let $h_i[k]\in\mathbb{C}^{1\times M_t}$, $i = 1,\dots,N$, denote the channel state vector seen by user i in interval k. In each interval, a subset of users can be simultaneously scheduled. Since each user has only one receive antenna, it can achieve at most one degree of freedom (i.e., its average data rate per channel use can scale with SNR as $\log(\mathrm{SNR})$). On the other hand, the system can achieve at most $M_t$ degrees of freedom, in that the total average system rate can scale with SNR as $M_t\log(\mathrm{SNR})$. For notational convenience, we assume that in each interval two users can be simultaneously served, hence limiting the achievable system degrees of freedom to 2. All results can, however, be extended to the general case without this restriction.

A. Conventional MU-MIMO scheme 

The conventional MU-MIMO scheme relies on estimates of the user channel states for the current interval that are available at the BS. Indeed, perfect CSIT for the current interval enables the BS to transmit simultaneously to both scheduled users without causing interference at either of them. However, in the absence of perfect CSIT, such complete interference suppression via transmitter-side processing is no longer possible, and when only very coarse estimates for the current interval are available, conventional MU-MIMO breaks down and in fact becomes inferior to simple single-user-per-interval transmission.
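The breakdown described above can be illustrated numerically. The Python/NumPy sketch below is a hypothetical illustration, not the simulation setup used later: it forms unit-norm zero-forcing beams from channel estimates and measures the interference power $|h_i w_j|^2$ that each user receives from its co-scheduled user's beam. Coarse CSIT is modelled, as an assumption, by adding complex Gaussian error of standard deviation `sigma` to the true channels; the helper names are ours.

```python
import numpy as np

def crandn(rng, *shape):
    """Array of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def zero_forcing(H_est):
    """Unit-norm zero-forcing beams for the two users whose estimates are the rows of H_est."""
    W = np.linalg.pinv(H_est)                  # Mt x 2, with H_est @ W = I
    return W / np.linalg.norm(W, axis=0)       # normalise each user's beam

def leakage(H_true, W):
    """Interference powers [|h_1 w_2|^2, |h_2 w_1|^2] under the true channels."""
    G = H_true @ W                             # 2 x 2 effective channel
    return np.array([abs(G[0, 1]) ** 2, abs(G[1, 0]) ** 2])

rng = np.random.default_rng(0)
Mt = 4
H = crandn(rng, 2, Mt)                         # true channels of the scheduled pair

print("perfect CSIT:", leakage(H, zero_forcing(H)))
for sigma in (0.05, 0.3, 1.0):                 # increasingly coarse estimates (assumed error model)
    H_est = H + sigma * crandn(rng, 2, Mt)
    print(f"sigma={sigma}:", leakage(H, zero_forcing(H_est)))
```

With perfect CSIT the leakage is at floating-point round-off level, while it grows with the estimation error, which is the mechanism behind the loss of conventional MU-MIMO under coarse CSIT.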
Fig. 1. Illustration of the scheduling process.

B. Joint Scheduling and Channel Feedback 

We consider a joint scheduling and channel feedback scheme that builds upon a variant of the extended MAT technique [6]. The extended MAT scheme is recapitulated in Appendix A. Specifically, we assume that coarse quantized channel state estimates from all users for the current interval are available to the BS, along with a limited set of finer albeit outdated quantized channel state estimates. In this context we note that in the FDD downlink only quantized estimates are available to the BS; henceforth, unless otherwise mentioned, we use "estimates" to mean "quantized estimates". The time duration of interest is divided into intervals, each comprising three slots. The three slots are mutually orthogonal time-bandwidth slices. For convenience, we assume that all three slots in an interval lie within the coherence time and coherence bandwidth, so that the channel seen by each user remains constant over the three slots of an interval. At the beginning of the k-th interval, whose slots are denoted by [k,1], [k,2] and [k,3], the scheduler broadcasts a short sequence of pilot symbols to all users. This sequence enables a coarse estimation of the wireless channel at each of the N users; the estimates are fed back to the BS after quantization and are denoted by $\mathcal{H}[k] = \{\hat h_i[k],\ i = 1,\dots,N\}$, where $\hat h_i[k]$ denotes the coarse channel estimate obtained from user i for interval k. Based on these coarse estimates, along with its past scheduling and channel state history (formally introduced next), the scheduler chooses a pair of users to schedule in the current interval; in the first slot, a linear combination of new packets is sent to the selected user pair. Data transmission to the selected pair in the current interval also contains additional pilots that enable a finer estimation of the channel states seen by that pair over the current interval. Such finer estimation is crucial for data detection. However, due to user processing and feedback delays, we assume that (quantized versions of) these finer estimates are not available to the BS during the current interval itself. Because of this constraint, instead of using slots 2 and 3 to resolve the interference in the packets sent in slot 1 of the current interval, as would be done in the extended MAT scheme [6], the BS uses them to resolve the interference in the packets sent in slot 1 of the most recent prior interval in which the selected user pair was scheduled. The scheduling model is illustrated in Fig. 1.

As mentioned above, the scheduler obtains a finer estimate of the channel states seen by a user pair over the interval in which the pair is scheduled, at the end of that interval.¹ Let $\theta = (u_1, u_2, \kappa)$ denote the 3-tuple describing the scheduling decision made for the current interval k, where $u_1, u_2$ denote the selected user pair and κ denotes the index of the most recent prior interval in which that pair was scheduled. We let $\mathcal{F}[k]$ be the collection, at the start of interval k, of the most recently obtained finer channel estimates at the BS for each user pair, together with the corresponding interval indices. Thus, $\mathcal{F}[k] = \{(\tilde h_i[\kappa_{ij}], \tilde h_j[\kappa_{ij}], \kappa_{ij}),\ 1\le i<j\le N\}$, where $\tilde h_i[\kappa_{ij}], \tilde h_j[\kappa_{ij}]$ denote the finer estimates for interval $\kappa_{ij}$, and $\kappa_{ij}$ denotes the index of the most recent prior interval in which pair (i, j) was scheduled. At the end of interval k (equivalently, at the start of interval k+1), the set $\mathcal{F}[k+1]$ is obtained by first setting it equal to $\mathcal{F}[k]$ and then updating the 3-tuple corresponding to the pair $(u_1, u_2)$ selected in interval k to $(\tilde h_{u_1}[k], \tilde h_{u_2}[k], k)$.

¹Arbitrary delays in obtaining such finer estimates are also considered later in the paper.
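The update of $\mathcal{F}[k]$ is simple bookkeeping. The minimal Python sketch below keeps one entry per pair in a dictionary keyed by $(i, j)$ with $i < j$; strings stand in for the estimate vectors, and the function names are hypothetical rather than part of the scheme's specification.

```python
from itertools import combinations

def init_fine_csit(num_users):
    """Placeholder F[0]: one (fine_i, fine_j, kappa_ij) entry per pair i < j (filled by the set-up phase)."""
    return {pair: (None, None, None) for pair in combinations(range(num_users), 2)}

def update_fine_csit(F, k, theta, fine):
    """Return F[k+1] from F[k] after decision theta = (u1, u2, kappa) in interval k.

    fine maps each scheduled user u to its finer estimate h_tilde_u[k],
    fed back at the end of interval k; all other entries are copied unchanged.
    """
    u1, u2, _kappa = theta                    # _kappa: interval whose packets slots 2-3 resolved
    i, j = sorted((u1, u2))
    F_next = dict(F)                          # start from a copy of F[k]
    F_next[(i, j)] = (fine[i], fine[j], k)    # the pair's most recent schedule is now interval k
    return F_next

F = init_fine_csit(3)
F[(0, 1)] = ("h0[2]", "h1[2]", 2)             # pair (0, 1) last scheduled in interval 2
F_next = update_fine_csit(F, k=5, theta=(1, 0, 2), fine={0: "h0[5]", 1: "h1[5]"})
print(F_next[(0, 1)])  # ('h0[5]', 'h1[5]', 5)
print(F_next[(0, 2)])  # (None, None, None)
```

Copying the dictionary keeps $\mathcal{F}[k]$ intact, which mirrors the "first set equal, then update one tuple" description.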

The user channel states are assumed to be i.i.d. across intervals, and the channel states of any two distinct users are assumed to be mutually independent. Given particular coarse estimates $(\hat h_{u_1}[k], \hat h_{u_2}[k])$ of the channel states of the user pair selected in interval k, the finer channel estimates in the same interval are described by the conditional distribution

$$P\left(\tilde h_{u_1}[k], \tilde h_{u_2}[k] \mid \hat h_{u_1}[k], \hat h_{u_2}[k]\right), \tag{1}$$

which depends on the types of channel estimators, the quantization, the training times and powers, etc. We let $\mathcal{C}_{\text{coarse}}$ ($\mathcal{C}_{\text{fine}}$) denote the finite sets, or codebooks, of vectors from which all coarse (fine) estimates are selected. Let $|\mathcal{C}_{\text{coarse}}|$ and $|\mathcal{C}_{\text{fine}}|$ denote their respective cardinalities; clearly $|\mathcal{C}_{\text{fine}}| > |\mathcal{C}_{\text{coarse}}|$.

C. Expected Transmission Rates (Rewards) 

During the current interval k, formed by slots [k,1], [k,2] and [k,3], once a pair of users is selected, the scheduler specifies transmit precoding matrices or vectors for each slot in the interval.

1) Slot 1: For slot 1, the overall transmit precoding matrix is denoted by $[W_{u_1}[k], W_{u_2}[k]]$, where $W_{u_1}[k], W_{u_2}[k]\in\mathbb{C}^{M_t\times 2}$. Let $x_{u_1}[k] = W_{u_1}[k]s_{u_1}[k]$ and $x_{u_2}[k] = W_{u_2}[k]s_{u_2}[k]$, where $s_{u_1}[k], s_{u_2}[k]$ denote the $2\times 1$ symbol vectors containing symbols formed from the new packets intended for users $u_1$ and $u_2$, respectively, with $\mathbb{E}[s_{u_i}[k]s_{u_i}^{\dagger}[k]] = I$, $i\in\{1,2\}$. Then, the signal transmitted in slot 1 is $x_{u_1}[k] + x_{u_2}[k]$, so that the received signals at both users are

$$y_{u_1}[k,1] = h_{u_1}[k]\big(x_{u_1}[k] + x_{u_2}[k]\big) + n_{u_1}[k,1], \tag{2}$$
$$y_{u_2}[k,1] = h_{u_2}[k]\big(x_{u_1}[k] + x_{u_2}[k]\big) + n_{u_2}[k,1]. \tag{3}$$

Note that the transmission power allocated to scheduled user $u_i$ is $\|W_{u_i}[k]\|^2$. We assume that the maximum average (per-slot) transmission power budget at the BS is P. Thus, the corresponding power constraint is $\|W_{u_1}[k]\|^2 + \|W_{u_2}[k]\|^2 \le P$. Notice that the precoding matrix $[W_{u_1}[k], W_{u_2}[k]]$ seeks to facilitate the transmission of new packets to users $u_1$ and $u_2$ and thus must be designed based on the available coarse estimates $(\hat h_{u_1}[k], \hat h_{u_2}[k])$, since the corresponding finer estimates for that interval are not yet available to the scheduler. Accordingly, we assume that this precoding matrix can be obtained as the output of an arbitrary but fixed (time-invariant) mapping from $\mathcal{C}_{\text{coarse}}\times\mathcal{C}_{\text{coarse}}$ to $\mathbb{C}^{M_t\times 4}$ when the coarse estimates $(\hat h_{u_1}[k], \hat h_{u_2}[k])$ are given as input. Assuming the mapping to be fixed is well suited to systems where so-called "precoded pilots" are not available, so that the choice of precoders needs to be signalled to the scheduled users. A fixed mapping (which is equivalent to one codebook of transmit precoders) then allows for efficient signaling.

2) Slot 2: In slot 2 of the interval, an interference-resolving packet is transmitted for a pending previous transmission to users $(u_1, u_2)$ that was sent in interval $\kappa < k$. In particular, the signal vector transmitted over the $M_t$ antennas is

$$z[k,2]\,\big(\tilde h_{u_1}[\kappa]\underbrace{W_{u_2}[\kappa]s_{u_2}[\kappa]}_{x_{u_2}[\kappa]}\big),$$

where $z[k,2]\in\mathbb{C}^{M_t\times 1}$ is a precoding vector. Note that $\tilde h_{u_1}[\kappa]x_{u_2}[\kappa]$ is a scalar, so the average power constraint $\mathbb{E}\big[\|z[k,2]\,\tilde h_{u_1}[\kappa]x_{u_2}[\kappa]\|^2\big] \le P$ can also be written as $\|z[k,2]\|^2\,\|\tilde h_{u_1}[\kappa]W_{u_2}[\kappa]\|^2 \le P$. The received signals in slot 2 at both users are therefore

$$y_{u_1}[k,2] = h_{u_1}[k]\,z[k,2]\,\big(\tilde h_{u_1}[\kappa]x_{u_2}[\kappa]\big) + n_{u_1}[k,2], \tag{4}$$
$$y_{u_2}[k,2] = h_{u_2}[k]\,z[k,2]\,\big(\tilde h_{u_1}[\kappa]x_{u_2}[\kappa]\big) + n_{u_2}[k,2]. \tag{5}$$

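3) Slot 3: Similarly, in slot 3 of the interval the transmitted signal is

$$z[k,3]\,\big(\tilde h_{u_2}[\kappa]\underbrace{W_{u_1}[\kappa]s_{u_1}[\kappa]}_{x_{u_1}[\kappa]}\big),$$

so that the power constraint is $\|z[k,3]\|^2\,\|\tilde h_{u_2}[\kappa]W_{u_1}[\kappa]\|^2 \le P$. The received signals in slot 3 at both users are therefore

$$y_{u_1}[k,3] = h_{u_1}[k]\,z[k,3]\,\big(\tilde h_{u_2}[\kappa]x_{u_1}[\kappa]\big) + n_{u_1}[k,3], \tag{6}$$
$$y_{u_2}[k,3] = h_{u_2}[k]\,z[k,3]\,\big(\tilde h_{u_2}[\kappa]x_{u_1}[\kappa]\big) + n_{u_2}[k,3]. \tag{7}$$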

Notice that the precoding vectors z[k,2], z[k,3] seek to facilitate the completion of a pending transmission to users $u_1$ and $u_2$, and thus must be designed based on the available coarse estimates $(\hat h_{u_1}[k], \hat h_{u_2}[k])$ as well as the available estimates for interval κ, which are $(\hat h_{u_1}[\kappa], \hat h_{u_2}[\kappa])$ and $(\tilde h_{u_1}[\kappa], \tilde h_{u_2}[\kappa])$. Accordingly, we assume that these two vectors can be obtained as the output of an arbitrary but fixed mapping from $\mathcal{C}_{\text{fine}}^2\times\mathcal{C}_{\text{coarse}}^4$ to $\mathbb{C}^{M_t\times 2}$. An example of mapping rules to obtain the precoding matrices and vectors is given later in the section on simulation results.
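The interference-resolution idea behind slots 2 and 3 can be checked in a noise-free toy setting. The Python/NumPy sketch below assumes that the delayed fine estimates are exact ($\tilde h = h$), omits noise, and draws the precoders $W$ and $z$ at random instead of through the fixed mappings described above; variable names are illustrative. User $u_1$ uses its slot-2 observation, as in (4), to remove $h_{u_1}[\kappa]x_{u_2}[\kappa]$ from its slot-1 observation, and the slot-3 observation, as in (6), supplies a second equation in its two symbols.

```python
import numpy as np

rng = np.random.default_rng(1)
Mt = 4

def crandn(*shape):
    """Array of i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Interval kappa, slot 1: new symbols for u1 and u2 are superposed, as in (2).
h1_old, h2_old = crandn(1, Mt), crandn(1, Mt)   # h_{u1}[kappa], h_{u2}[kappa]
W1, W2 = crandn(Mt, 2), crandn(Mt, 2)           # W_{u1}[kappa], W_{u2}[kappa]
s1, s2 = crandn(2, 1), crandn(2, 1)             # s_{u1}[kappa], s_{u2}[kappa]
x1, x2 = W1 @ s1, W2 @ s2
y_slot1 = h1_old @ (x1 + x2)                    # u1's slot-1 observation (noise omitted)

# Interval k, slots 2 and 3, with exact delayed CSI (an idealisation of h_tilde).
h1_new = crandn(1, Mt)                          # h_{u1}[k]
z2, z3 = crandn(Mt, 1), crandn(Mt, 1)           # arbitrary z[k,2], z[k,3]
y_slot2 = (h1_new @ z2) * (h1_old @ x2)         # resends u1's interference term
y_slot3 = (h1_new @ z3) * (h2_old @ x1)         # second look at u1's symbols

# u1 cancels the slot-1 interference, then solves a 2 x 2 linear system for s_{u1}.
lhs = np.vstack([y_slot1 - y_slot2 / (h1_new @ z2), y_slot3])
G = np.vstack([h1_old @ W1, (h1_new @ z3) * (h2_old @ W1)])
s1_hat = np.linalg.solve(G, lhs)
print(np.allclose(s1_hat, s1))
```

With quantized fine estimates, the cancellation step instead leaves the residual term $(h_{u_1}[\kappa]-\tilde h_{u_1}[\kappa])x_{u_2}[\kappa]$, which the rate expressions that follow account for.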

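Next, in order to compute the average rates (rewards), we assume that the channel state vectors $h_{u_i}[\kappa], h_{u_i}[k]$ are known perfectly to user $u_i$, $i\in\{1,2\}$ (each user of course also knows the quantized estimates it has fed back to the BS). In addition, user $u_1$ ($u_2$) is conveyed the finer estimate $\tilde h_{u_2}[\kappa]$ ($\tilde h_{u_1}[\kappa]$) via feed-forward signaling before the start of interval k. For simplicity, the feedback and feed-forward signaling overheads are ignored in this work. Then, by the end of slot 3, from (2), (4) and (6), at user $u_1$ we have

$$\begin{bmatrix} y_{u_1}[\kappa,1] - \dfrac{y_{u_1}[k,2]}{h_{u_1}[k]z[k,2]} \\ y_{u_1}[k,3]\end{bmatrix} = \begin{bmatrix} h_{u_1}[\kappa]W_{u_1}[\kappa] \\ \delta_{u_1}[k]\,\tilde h_{u_2}[\kappa]W_{u_1}[\kappa]\end{bmatrix} s_{u_1}[\kappa] + \begin{bmatrix}\big(h_{u_1}[\kappa]-\tilde h_{u_1}[\kappa]\big)x_{u_2}[\kappa] + n_{u_1}[\kappa,1] - \dfrac{n_{u_1}[k,2]}{h_{u_1}[k]z[k,2]} \\ n_{u_1}[k,3]\end{bmatrix}, \tag{8}$$

where $\delta_{u_1}[k] = h_{u_1}[k]z[k,3]$ and the additive noise variables $n_{u_1}[\kappa,1], n_{u_1}[k,2], n_{u_1}[k,3]$ are i.i.d. circularly symmetric complex Gaussian with zero mean and unit variance, $\mathcal{CN}(0,1)$. Notice that the interference term $(h_{u_1}[\kappa]-\tilde h_{u_1}[\kappa])x_{u_2}[\kappa]$ is independent of the desired signal as well as of the additive noise. Letting $h^{\text{err}}_{u_1}[\kappa] = h_{u_1}[\kappa]-\tilde h_{u_1}[\kappa]$, the noise-plus-interference covariance for user $u_1$, denoted by $\Gamma_{u_1}[k]$, is therefore

$$\Gamma_{u_1}[k] = \begin{bmatrix} 1 + |h_{u_1}[k]z[k,2]|^{-2} + \|h^{\text{err}}_{u_1}[\kappa]W_{u_2}[\kappa]\|^2 & 0 \\ 0 & 1\end{bmatrix}.$$

Define $G_{u_1}[k] = [h_{u_1}[\kappa]W_{u_1}[\kappa];\ \delta_{u_1}[k]\tilde h_{u_2}[\kappa]W_{u_1}[\kappa]]$ and note that $G_{u_1}[k]\in\mathbb{C}^{2\times 2}$. Further, let $\mathcal{H}^{\text{csi}}((u_1,u_2),(\kappa,k)) = \{\tilde h_{u_1}[\kappa], \tilde h_{u_2}[\kappa], \hat h_{u_1}[\kappa], \hat h_{u_2}[\kappa], \hat h_{u_1}[k], \hat h_{u_2}[k]\}$ denote the channel state information at the scheduler for user pair $(u_1, u_2)$ over intervals κ and k. Then, using (8), the instantaneous information rate, denoted by $I_{u_1}[k]$, is given by

$$I_{u_1}[k] = \frac{1}{3}\log\left|I + \Gamma_{u_1}^{-1}[k]\,G_{u_1}[k]\,G_{u_1}^{\dagger}[k]\right|, \tag{9}$$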



where the factor 1/3 accounts for the fact that three slots are needed to obtain this rate. Then, (an optimistic value for) the average information rate that can be achieved via rateless coding (cf. [9]) is given by

$$R^{\text{rl}}_{u_1}[k] = \mathbb{E}\left[I_{u_1}[k] \mid \mathcal{H}^{\text{csi}}((u_1,u_2),(\kappa,k))\right]. \tag{10}$$

A more conservative rate that is appropriate for conventional coding, denoted by $R^{\text{conv}}_{u_1}[k]$, is given by

$$R^{\text{conv}}_{u_1}[k] = r_{\theta,u_1}\left(1 - \Pr\left(I_{u_1}[k] < r_{\theta,u_1} \mid \mathcal{H}^{\text{csi}}((u_1,u_2),(\kappa,k))\right)\right), \tag{11}$$

where $r_{\theta,u_1}$ denotes the rate assigned (using any fixed mapping) to user $u_1$ in θ before the transmission of new packets for the pair $(u_1, u_2)$ in interval κ, based on the available coarse estimates $\hat h_{u_1}[\kappa], \hat h_{u_2}[\kappa]$. The rates corresponding to (10) or (11) can be derived in a similar manner for user $u_2$.
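Expressions (9)-(11) can be evaluated by Monte Carlo once samples of $(G_{u_1}[k], \Gamma_{u_1}[k])$ drawn conditionally on $\mathcal{H}^{\text{csi}}$ are available. The Python/NumPy sketch below assumes such samples are given; for illustration it replaces the true conditional model with a hypothetical one in which G is a known matrix plus complex Gaussian error and Γ = I. Logarithms are base 2, so rates are in bits per channel use, and the helper names are ours.

```python
import numpy as np

def info_rate(G, Gamma):
    """Instantaneous rate (9): (1/3) log2 det(I + Gamma^{-1} G G^H)."""
    M = np.eye(G.shape[0]) + np.linalg.solve(Gamma, G @ G.conj().T)
    _, logabsdet = np.linalg.slogdet(M)          # det(M) is real and positive here
    return logabsdet / (3.0 * np.log(2.0))

def average_rates(samples, r_assigned):
    """Monte-Carlo versions of (10) and (11) from samples of I_{u1}[k] drawn given H_csi."""
    samples = np.asarray(samples, dtype=float)
    rateless = samples.mean()                                          # (10)
    conventional = r_assigned * (1.0 - np.mean(samples < r_assigned))  # (11)
    return rateless, conventional

# Hypothetical conditional model: G known up to Gaussian error, Gamma = I.
rng = np.random.default_rng(0)
G_hat = np.array([[1.0, 0.3], [0.2, 0.8]], dtype=complex)
samples = []
for _ in range(2000):
    E = 0.3 * (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    samples.append(info_rate(G_hat + E, np.eye(2)))
print(average_rates(samples, r_assigned=0.3))
```

The conservative rate never exceeds the assigned rate $r_{\theta,u_1}$, and it drops as the outage probability $\Pr(I_{u_1}[k] < r_{\theta,u_1})$ grows.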

Note that in deriving the average rate in (10) or (11) we have assumed a simple albeit sub-optimal filtering at the user to suppress the interference from the transmission intended for the co-scheduled user. For completeness, we provide the average rate expressions for the case in which the user employs the optimal linear filter; for brevity we only consider the optimistic rate for user $u_1$. Towards this end, we collect the observations received by user $u_1$ as

For this model, the instantaneous information rate achievable via optimal filtering can be determined using (9), but with $\Gamma_{u_1}[k] = I + F_{u_1}[k]W_{u_2}[\kappa]W_{u_2}^{\dagger}[\kappa]F_{u_1}^{\dagger}[k]$ and $G_{u_1}[k] = F_{u_1}[k]W_{u_1}[\kappa]$. The average information rate can then be determined as before using (10).

We assume that either conventional coding is employed for all users or rateless coding is employed for all users, and accordingly let $R_{u_i}[k]$, $1\le i\le 2$, denote the average rate, henceforth also referred to as the service rate, obtained over interval k. We also note that the scheduling scheme (policy) is preceded by an initial set-up phase comprising $N(N-1)/2$ intervals, in which new packets are transmitted successively to each user pair without any accompanying interference-resolution packets. For notational convenience, we assume that the scheduling policy starts operating from interval index 0, using the initial set $\mathcal{F}[0]$ determined by the set-up phase.



D. Incorporating one-shot transmissions and feedback delays 

We first consider the case of one-shot transmissions. To enable one-shot transmission of packets to any pair in any interval k, we define an action θ in which $(u_1, u_2)$ is the pair but $\kappa = \emptyset$, capturing the fact that the intended transmission is one-shot and hence does not seek to resolve any pending previous transmission. Then, in all three slots of that interval, transmission is done as in conventional MU-MIMO, relying only on the available current estimates $\mathcal{H}[k]$. In particular, a transmit precoder $[w_{u_1}[k], w_{u_2}[k]]\in\mathbb{C}^{M_t\times 2}$ is formed based on $\{\hat h_{u_1}[k], \hat h_{u_2}[k]\}$ using a technique such as zero-forcing [?]. Defining $I^{\text{one-shot}}_{u_1}[k] = \log\left(1 + \frac{|h_{u_1}[k]w_{u_1}[k]|^2}{1 + |h_{u_1}[k]w_{u_2}[k]|^2}\right)$, the corresponding average rates obtained for user $u_1$ (similarly for user $u_2$) are given by

In addition, at the end of interval k we simply set $\mathcal{F}[k+1] = \mathcal{F}[k]$, since no pending packets are completed or introduced.
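As an illustration of the one-shot rate, the Python/NumPy sketch below builds a zero-forcing precoder from the coarse estimates, splits the power budget P equally between the two beams (an assumption; the text only requires a technique such as zero-forcing), and evaluates $I^{\text{one-shot}}$ with the true channel using base-2 logarithms. The helper names are hypothetical.

```python
import numpy as np

def zf_precoder(H_coarse, P):
    """Zero-forcing [w_{u1}, w_{u2}] from the 2 x Mt coarse estimates, with total power P
    split equally between the two beams (so ||w_{u1}||^2 + ||w_{u2}||^2 = P)."""
    W = np.linalg.pinv(H_coarse)
    return W / np.linalg.norm(W, axis=0) * np.sqrt(P / 2.0)

def one_shot_rate(h, W, user=0):
    """log2(1 + |h w_user|^2 / (1 + |h w_other|^2)) for the TRUE channel h of `user`."""
    other = 1 - user
    signal = abs(h @ W[:, user]) ** 2
    interference = abs(h @ W[:, other]) ** 2
    return np.log2(1.0 + signal / (1.0 + interference))

H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])          # orthogonal channels, estimates assumed exact
W = zf_precoder(H, P=2.0)
print(one_shot_rate(H[0], W, user=0))          # close to 1.0: no leakage, unit SNR per beam
```

When `H_coarse` differs from the true channels, the denominator term $|h_{u_1}[k]w_{u_2}[k]|^2$ becomes nonzero and the one-shot rate falls accordingly.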



Recall that so far we have assumed that, upon choosing action θ for interval k, the finer estimates $\tilde h_{u_1}[k], \tilde h_{u_2}[k]$ are available at the start of interval k+1 (representing a unit delay). In practical systems there can be a delay of several intervals in obtaining such finer estimates. Assuming that these delays are fixed and known in advance, they can be accommodated by expanding the definition of a state. In particular, we can define 4-tuples such as $(i, j, \kappa_{ij}, d_{ij})$, where $d_{ij}\ge 0$ measures the remaining delay after which the finer estimates $\tilde h_i[\kappa_{ij}], \tilde h_j[\kappa_{ij}]$ will be available. At any interval k, selecting the action $(i, j, \kappa_{ij}, d_{ij})$ with $d_{ij} > 0$ ($d_{ij} = 0$) constrains the interference resolution to be based only on the coarse estimates $\hat h_i[\kappa_{ij}], \hat h_j[\kappa_{ij}], \hat h_i[k], \hat h_j[k]$ (on both the coarse and fine estimates $\mathcal{H}^{\text{csi}}((i,j),(\kappa_{ij},k))$). Upon selecting this action, the 4-tuple in $\mathcal{F}[k+1]$ corresponding to the pair (i, j) is set to $(i, j, k, d_{ij} = D_{ij})$, where $D_{ij}$ is the maximum delay (starting from k+1) after which the finer estimates will be available. If that action is not selected, the tuple is updated in $\mathcal{F}[k+1]$ as $(i, j, \kappa_{ij}, d_{ij} = \max\{0, d_{ij}-1\})$. For convenience in exposition, the aforementioned two extensions are not considered below.
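A minimal Python sketch of the delayed-feedback bookkeeping follows. It tracks only the $(\kappa_{ij}, d_{ij})$ part of each 4-tuple (the estimates themselves are omitted) in a dictionary keyed by pair; the function names and data layout are illustrative assumptions.

```python
def update_delayed(F, k, chosen_pair, D):
    """F[k] -> F[k+1] for the delayed-feedback extension.

    F           : dict mapping each pair (i, j), i < j, to (kappa_ij, d_ij)
    k           : index of the current interval
    chosen_pair : pair (i, j) scheduled in interval k
    D           : dict mapping each pair to its maximum feedback delay D_ij
    """
    F_next = {}
    for pair, (kappa, d) in F.items():
        if pair == chosen_pair:
            F_next[pair] = (k, D[pair])            # new pending packets; fine CSI due after D_ij
        else:
            F_next[pair] = (kappa, max(0, d - 1))  # one interval closer to its fine CSI
    return F_next

def fine_csi_available(F, pair):
    """Resolution for `pair` may use both coarse and fine estimates only when d_ij == 0."""
    return F[pair][1] == 0

F = {(0, 1): (3, 2), (0, 2): (1, 0), (1, 2): (4, 1)}
D = {pair: 2 for pair in F}
F_next = update_delayed(F, k=5, chosen_pair=(0, 2), D=D)
print(F_next)  # {(0, 1): (3, 1), (0, 2): (5, 2), (1, 2): (4, 0)}
```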

E. System State and Throughput Region 

Define the system state at the start of interval j as $S[j] = \{\mathcal{F}[j], \mathcal{H}[j]\}$ and let θ[j] denote the decision (action) taken in that interval. Then, at each interval k, a scheduling policy ψ takes as input the entire history up to interval k, comprising the states $\{S[j]\}_{j=0}^{k}$ and all decisions $\{\theta[j]\}_{j=0}^{k-1}$, and outputs a decision θ[k]. Under a particular policy ψ, the throughput of the n-th user is denoted as

$$r_n^{\psi} = \lim_{J\to\infty}\frac{1}{J}\sum_{t=0}^{J-1}\mathbb{E}\left[R_n^{\psi}[t]\right], \tag{13}$$

where $R_n^{\psi}[t] = R_n[t]\,\mathbb{1}(n\in\theta[t])$ and the expectation is over the initial state and the evolution of the states and decisions in the subsequent intervals. Note that in (13), for simplicity, we have assumed that the limit exists for the selected policy; if it does not, we can consider any subsequence for which the limit exists. Let Ψ be the set of all policies. The throughput region of interest is defined as the closure of the convex hull of the throughput vectors achievable under all policies in Ψ, i.e.,

$$\Lambda = \mathrm{CH}\{r : \exists\,\psi\in\Psi \text{ s.t. } r = r^{\psi}\},$$

where CH{·} denotes the closure of the convex hull. For each throughput vector r, we obtain a utility value U(r), where U(·) is a non-negative, component-wise non-decreasing and concave utility function. For convenience, we also assume that the utility is continuous (and hence uniformly continuous) on the closed hypercube $[0,b]^N$ for each finite $b\in\mathbb{R}_+$. The objective then is to maximize the network utility within the throughput region, i.e., $\max_{r\in\Lambda} U(r)$.

III. Optimal frame-based scheduling policy 

In this section, we propose a frame-based policy that achieves a utility arbitrarily close to the optimum. In this policy, the time intervals are further grouped into separate frames, where each frame consists of T consecutive intervals. The scheduling decisions in each frame are based on a set of virtual queues that guide the achieved system utility towards the optimum, as specified next.



A. Virtual Queue and Virtual Arrival Process 

To control the utilities achieved by different users, a virtual queue is maintained for each user, denoted as $Q_n[k]$, $k = 0, 1, \dots$, $n = 1,\dots,N$. At the beginning of the τ-th frame, comprising intervals $\{\tau T,\dots,(\tau+1)T-1\}$ where $\tau\in\{0,1,2,\dots\}$, the following optimization problem is solved at the scheduler:

$$\max_{0\le r\le r_{\max}\mathbf{1}}\ V\cdot U(r) - \sum_{n=1}^{N} Q_n[\tau T]\,r_n, \tag{14}$$

where $r_{\max}$ and V are positive constants that can be freely chosen and whose role will be revealed later. We let $r^*[\tau]$ be the optimal solution to the above problem. Then, the virtual arrival rate for user n is set to $r_n^*[\tau]$ in each interval of the τ-th frame. A scheduling policy $\psi_{Q[\tau T]}$ is determined and implemented based on the virtual queue lengths $Q[\tau T]$ obtained at the beginning of that frame. Letting $R_n^{\psi_{Q[\tau T]}}[k]$ denote the service rate of user n in interval k of the τ-th frame under this policy, the virtual queue is then updated as

$$Q_n[k+1] = \left[Q_n[k] - R_n^{\psi_{Q[\tau T]}}[k]\right]^+ + r_n^*[\tau], \tag{15}$$

for all $\tau T\le k\le(\tau+1)T-1$ and each user n, where $[x]^+ = \max\{0, x\}$, with $Q_n[0] = 0$ for all n.
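For a concrete utility, the per-frame problem (14) has a closed form. The Python/NumPy sketch below assumes $U(r) = \sum_n \log(1 + r_n)$, which is non-negative, non-decreasing, concave and continuous as required, and pairs it with one step of the virtual-queue update (15). The utility choice and the function names are illustrative, not taken from the scheme itself.

```python
import numpy as np

def virtual_arrivals(Q, V, r_max):
    """Solve (14) for the assumed separable utility U(r) = sum_n log(1 + r_n).

    Each term V*log(1 + r_n) - Q_n*r_n is concave in r_n, so its maximiser over
    [0, r_max] is the stationary point V/Q_n - 1 clipped to that interval
    (and r_max when Q_n = 0, where the term is increasing).
    """
    Q = np.asarray(Q, dtype=float)
    with np.errstate(divide="ignore"):
        r = np.where(Q > 0, V / Q - 1.0, r_max)
    return np.clip(r, 0.0, r_max)

def update_queues(Q, service, arrivals):
    """One interval of (15): Q[k+1] = max(Q[k] - R[k], 0) + r*[tau]."""
    return np.maximum(np.asarray(Q, dtype=float) - service, 0.0) + arrivals

Q = np.array([0.0, 4.0, 20.0])
r_star = virtual_arrivals(Q, V=10.0, r_max=3.0)    # values [3.0, 1.5, 0.0]
Q = update_queues(Q, service=np.array([1.0, 5.0, 2.0]), arrivals=r_star)  # values [3.0, 1.5, 18.0]
print(r_star, Q)
```

A larger V pushes the arrivals, and hence the achieved rates, towards the utility-maximizing point at the cost of longer virtual queues, consistent with the $BT/V$ gap in Theorem 1.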

B. State-action frequency approach 

We now determine the policy $\psi_{Q[\tau T]}$ employed in the τ-th frame. While the definition of the system state adopted thus far allows us to compactly describe any policy, one associated drawback is that the number of states is countably infinite, since the interval indices $\kappa_{ij}$ grow without bound. Fortunately, there is one aspect that we can exploit. The average rates obtained upon scheduling a pair of users i, j in any interval k depend only on the corresponding coarse and fine channel estimates in interval $\kappa_{ij}$ (which, we recall, denotes the most recent prior interval in which that pair was scheduled) and on the coarse channel estimates in interval k, but not on the interval indices themselves. Then, to analyze the average rates offered by any policy, it suffices to define a finite set of states, $\mathcal{S}$, as follows. A state $s\in\mathcal{S}$ is defined as a particular choice $\{h_i^{p,\text{fine}}, h_j^{p,\text{fine}}, h_i^{p,\text{coarse}}, h_j^{p,\text{coarse}}, h_i^{c,\text{coarse}}, h_j^{c,\text{coarse}}\}$ of coarse and fine channel estimates for each pair $1\le i<j\le N$, where the superscripts p and c denote past and current estimates, respectively. Consequently, there are

$$|\mathcal{S}| = \left(|\mathcal{C}_{\text{fine}}|^2\,|\mathcal{C}_{\text{coarse}}|^2\right)^{N(N-1)/2}\,|\mathcal{C}_{\text{coarse}}|^{N}$$

states. Note that a state S[k] in the previous definition maps to the state $s\in\mathcal{S}$ that has the choice $\tilde h_i[\kappa_{ij}], \tilde h_j[\kappa_{ij}], \hat h_i[\kappa_{ij}], \hat h_j[\kappa_{ij}], \hat h_i[k], \hat h_j[k]$ for each pair i < j. A finite set of actions, $\mathcal{A}$, is defined next as the collection of all possible user pairs, so that any $a\in\mathcal{A}$ uniquely identifies a user pair. Let $P(s'|s,a)$ denote the transition probability, which can be determined using (1) together with the facts that the finer past estimates of pairs not in a do not change and that the current coarse estimates are i.i.d. across intervals. Letting $\mathcal{P}(\mathcal{A})$ denote the set of all probability distributions on $\mathcal{A}$, any policy can be defined as a mapping that, at each interval k, takes as input the entire history up to interval k, comprising the states $\{s[j]\}_{j=0}^{k}$ and all actions $\{a[j]\}_{j=0}^{k-1}$, and outputs a distribution in $\mathcal{P}(\mathcal{A})$ from which the action a[k] is generated. A stationary policy is one that, at any interval k, considers only the state s[k] to output a distribution in $\mathcal{P}(\mathcal{A})$, where the output distribution depends only on s[k] and not on the interval index k. Under any stationary policy, the sequence $\{s[k]\}_{k=0}^{\infty}$ is a Markov chain.

With these definitions in hand, we let $R_n(s,a)$ denote the transmission rate achieved by user n when action a is taken and the system state is s. Denote the state-action frequencies by $\{x(s,a)\}_{s\in\mathcal{S},a\in\mathcal{A}}$, where each x(s,a) lies in the unit interval [0,1] and represents the frequency with which the system is in state s and action a is taken. The state-action frequencies need to satisfy the normalization equation

$$\sum_{s\in\mathcal{S}}\sum_{a\in\mathcal{A}} x(s,a) = 1,$$

and the balance equations

$$\sum_{a\in\mathcal{A}} x(s',a) = \sum_{s\in\mathcal{S}}\sum_{a\in\mathcal{A}} P(s'|s,a)\,x(s,a), \quad \forall s'\in\mathcal{S}.$$

The above two equations form a state-action polytope $\mathcal{X}$; let x denote any vector of state-action frequencies lying in $\mathcal{X}$. We next define a rate region as

$$\hat\Lambda = \Big\{R : R_n = \sum_{s}\sum_{a} R_n(s,a)\,x(s,a),\ \forall n,\ x\in\mathcal{X}\Big\}. \tag{16}$$

Then, given the virtual queue length $q = Q[\tau T]$, we consider the following linear program (LP):

$$\max_{x}\ \sum_{s,a} q^T R(s,a)\,x(s,a) \quad \text{s.t. } x\in\mathcal{X}. \tag{17}$$

We use $x^*$ to denote an optimal solution to the linear program and define $R^* = [R_1^*,\dots,R_N^*]^T$, where

$$R_n^* = \sum_{s}\sum_{a} R_n(s,a)\,x^*(s,a). \tag{18}$$

Using Bayes' rule, we can identify the corresponding stationary policy $\psi_{Q[\tau T]}$, which at any interval k in the τ-th frame first maps the state S[k] to its counterpart $s\in\mathcal{S}$. Then, if $\sum_{a'} x^*(s,a') > 0$, it chooses action a using the probabilistic rule

$$P(\text{pick } a \text{ at state } s) = \frac{x^*(s,a)}{\sum_{a'\in\mathcal{A}} x^*(s,a')}, \quad \forall a\in\mathcal{A}.$$

On the other hand, if $\sum_{a'} x^*(s,a') = 0$, it chooses action a arbitrarily. Let $R^{\text{frame}}[k]$, $\tau T\le k\le(\tau+1)T-1$, denote the service rate vectors obtained under this policy for the intervals in the τ-th frame.
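The LP (17) and the Bayes-rule policy extraction can be prototyped on a small synthetic MDP. The sketch below assumes SciPy's `scipy.optimize.linprog` (HiGHS solver) and NumPy; the toy transition probabilities and rates are randomly generated placeholders rather than quantities derived from (1) and (10)-(11), and all function names are hypothetical. As a sanity check tied to Lemma 1, it compares the LP value with the best long-run value over all deterministic stationary policies, which can be enumerated for such a tiny instance.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def solve_state_action_lp(P, R, q):
    """Solve LP (17). P[s, a, s2] = P(s2|s,a), R[s, a, n] = R_n(s,a), q = Q[tau T].
    Returns x* as an (S, A) array and the optimal value sum_{s,a} q^T R(s,a) x*(s,a)."""
    S, A, _ = P.shape
    c = -(R @ q).ravel()                     # linprog minimises, so negate the objective
    A_eq, b_eq = [], []
    for s2 in range(S - 1):                  # balance equations; the last is implied by the rest
        row = -P[:, :, s2].ravel()           # -sum_{s,a} P(s2|s,a) x(s,a)
        row[s2 * A:(s2 + 1) * A] += 1.0      # +sum_a x(s2,a)
        A_eq.append(row)
        b_eq.append(0.0)
    A_eq.append(np.ones(S * A))              # normalisation: sum_{s,a} x(s,a) = 1
    b_eq.append(1.0)
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(S, A), -res.fun

def stationary_policy(x, tol=1e-12):
    """Bayes rule P(a|s) = x*(s,a) / sum_a' x*(s,a'); uniform (arbitrary) if the row sum is 0."""
    A = x.shape[1]
    mass = x.sum(axis=1, keepdims=True)
    return np.where(mass > tol, x / np.where(mass > tol, mass, 1.0), 1.0 / A)

def deterministic_policy_value(P, R, q, pi):
    """Long-run average of q^T R under the deterministic stationary policy pi (pi[s] = action)."""
    S = P.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, pi]                        # (S, S) transition matrix under pi
    M = np.vstack([P_pi.T - np.eye(S), np.ones((1, S))])
    b = np.append(np.zeros(S), 1.0)
    mu = np.linalg.lstsq(M, b, rcond=None)[0]  # stationary distribution of P_pi
    return float(mu @ (R[idx, pi] @ q))

rng = np.random.default_rng(0)
S, A, N = 3, 2, 2                            # toy sizes (hypothetical)
P = rng.random((S, A, S)) + 0.1              # strictly positive transitions
P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A, N))
q = np.array([1.0, 2.0])

x_star, value = solve_state_action_lp(P, R, q)
policy = stationary_policy(x_star)
best = max(deterministic_policy_value(P, R, q, np.array(pi))
           for pi in itertools.product(range(A), repeat=S))
print(value, best)
```

Because every transition probability is strictly positive in this toy example, each stationary policy induces an ergodic chain, so the LP value and the best deterministic value should coincide up to solver tolerance.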

We list the following results, which can be obtained from those derived for weakly communicating Markov Decision Processes [10], [11].



Lemma 1. The throughput region Λ defined in Section II-E is identical to the rate region defined in (16). Further, for each frame τ and any given Q[τT], an optimal solution to the LP in (17) can be found for which the corresponding policy $\psi_{Q[\tau T]}$ is also deterministic.

Henceforth, we assume $\psi_{Q[\tau T]}$ to be deterministic as well.

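Lemma 2. For an arbitrarily fixed δ > 0, there exist a large enough frame length $T_0$ and constants γ, β such that, for each frame length $T\ge T_0$ and all Q[τT],

$$\Pr\left(\left\|\frac{1}{T}\sum_{j=0}^{T-1}R^{\text{frame}}[\tau T+j] - R^*\right\| > \delta\right) \le \gamma\exp(-\beta T). \tag{19}$$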



C. Optimality of the frame-based policy 

Define the Lyapunov function $L(Q[\tau T]) = \frac{1}{2}\sum_{n=1}^{N}Q_n^2[\tau T]$. Then the T-step average Lyapunov drift is expressed as

$$\Delta_T(Q[\tau T]) = \frac{1}{T}\mathbb{E}\left[L(Q[(\tau+1)T]) - L(Q[\tau T]) \mid Q[\tau T]\right],$$

where the expectation is over the initial states at interval τT induced by the policies adopted in the previous frames and over the evolution of the states and decisions in the τ-th frame under the policy $\psi_{Q[\tau T]}$. Our first result is the following.

Proposition 1. For any given ε > 0, there exists a frame length $T_0$ such that, for all frame lengths $T\ge T_0$, the T-step average Lyapunov drift can be bounded as

$$\Delta_T(Q[\tau T]) \le BT - \sum_{n=1}^{N}Q_n[\tau T]\,R_n + \sum_{n=1}^{N}Q_n[\tau T]\,r_n^*[\tau], \tag{20}$$

where B is a constant and $R = [R_1,\dots,R_N]^T$ is any vector such that $R + \epsilon\mathbf{1}\in\Lambda$.



Proof: Proved in Appendix [ 
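Consider the ε-interior of Λ, i.e., $\Lambda_\epsilon = \{R : R + \epsilon\mathbf{1}\in\Lambda\}$. Denote by $r^{\text{opt}}$ an optimal solution of the following optimization problem:

$$\max_{r}\ U(r) \quad \text{s.t. } r\in\Lambda_\epsilon,\ r\le r_{\max}\mathbf{1}.$$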



Our main result is the following. 



Theorem 1. For any given ε > 0, there exists a $T_0$ such that, for all frame lengths $T\ge T_0$,

$$\liminf_{J\to\infty} U\left(\frac{1}{J}\sum_{t=0}^{J-1}\mathbb{E}\left[R^{\text{frame}}[t]\right]\right) \ge U(r^{\text{opt}}) - BT/V.$$



Proof: Proof Sketch in Appendix EO ■ 
Thus, by choosing ε, the frame length T and the parameters V, $r_{\max}$ appropriately, our frame-based policy can be made arbitrarily close to optimal.

For comparison, we will use the conventional MU-MIMO scheduling described in Section III-A. In addition, we also use the following myopic policy, which operates in a manner similar to the frame-based policy but with the following important differences. First, the frame length is set to $T = 1$, so that the arrival rates are computed at the start of each interval and the virtual queues are updated at the end of that interval. Second, at each interval $k$ the current state $S[k]$ is mapped to its image $s \in \mathcal{S}$ and, considering the queue length $q = Q[k]$, the action $a^\star = \arg\max_{a \in \mathcal{A}} q^T R(s, a)$ is selected. Clearly, this policy does not consider the transition probabilities (and hence the possible future evolutions) at all while deciding an action. Nevertheless, as seen in the following section, this policy indeed offers competitive performance.
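The myopic rule reduces to a weighted-rate maximization in every interval. A minimal sketch follows, assuming the rate vectors $R(s, a)$ for the current state image $s$ are available as a table (here a hypothetical dict from action labels to length-$N$ rate lists) and that the virtual queues follow the standard update $Q \leftarrow \max(Q - R, 0) + A$ with $T = 1$. The action labels and numbers are illustrative only.

```python
from typing import Dict, List, Sequence


def myopic_action(q: Sequence[float], rates: Dict[str, List[float]]) -> str:
    """Return the action a maximizing q^T R(s, a) for the current state image s.

    `rates` maps each admissible action to its rate vector R(s, a); ties are
    broken in favor of the action listed first.
    """
    return max(rates, key=lambda a: sum(qn * rn for qn, rn in zip(q, rates[a])))


def update_queues(q: Sequence[float], served: Sequence[float], arrival: Sequence[float]) -> List[float]:
    """One-interval virtual-queue update (frame length T = 1)."""
    return [max(qn - rn, 0.0) + an for qn, rn, an in zip(q, served, arrival)]


# Hypothetical rate table for one state image s with two users.
table = {
    "serve_1_current": [2.0, 0.0],
    "serve_2_current": [0.0, 2.0],
    "emat_pair": [1.2, 1.2],
}
q = [1.0, 3.0]
a = myopic_action(q, table)
print(a)  # serve_2_current
q = update_queues(q, table[a], [1.0, 1.0])
print(q)  # [2.0, 2.0]
```

No transition probabilities appear anywhere in the sketch, which is exactly what makes the rule myopic.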

IV. Simulation Results 

We consider a narrowband downlink with four single-antenna users that are served by a BS equipped with four transmit antennas. All users are assumed to experience an identical (large-scale fading) pathloss factor $\delta$ and thus see an identical average SNR, which models the physical scenario in which all users are equidistant from the BS. Further, we model the small-scale fading seen by each user as Rayleigh fading, so the channel response vector of each user is assumed to have i.i.d. $\mathcal{CN}(0, \delta^2)$ elements. Consequently, the normalized channel response vector (i.e., the channel direction) is isotropically distributed in $\mathbb{C}^{4\times 1}$. Moreover, the channel response vectors evolve independently across intervals and are independent across users. In the following simulations, each user quantizes its channel norm and channel direction separately. In particular, the channel norm is quantized using a scalar quantizer, which for simplicity we assume to be identical for both the fine and the coarse estimates. On the other hand, to obtain the finer estimate of the channel direction, the quantization codebook comprises a set of independently generated instances of isotropic vectors in $\mathbb{C}^{4\times 1}$ (a.k.a. a random vector codebook); we note that for large codebook sizes, random vector codebooks have been shown to be a good choice for both SU-MIMO and conventional MU-MIMO. The coarser estimate of the channel direction is obtained using Grassmannian codebooks.
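The random vector codebook used for the finer direction estimate can be sketched in a few lines. This assumes a $B$-bit codebook of $2^B$ codewords and the usual selection rule of picking the codeword with the largest magnitude inner product with the channel (equivalently, the smallest chordal distance to the channel direction). The Grassmannian codebook used for the coarse estimate is not reproduced, and the function names are hypothetical.

```python
import cmath
import math
import random
from typing import List

Vector = List[complex]


def isotropic_unit_vector(m: int, rng: random.Random) -> Vector:
    """Normalize an i.i.d. CN(0, 1) vector; the result is isotropic on the unit sphere of C^m."""
    g = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(m)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in g))
    return [x / norm for x in g]


def random_vector_codebook(bits: int, m: int, seed: int = 0) -> List[Vector]:
    """2**bits independently generated isotropic unit vectors."""
    rng = random.Random(seed)
    return [isotropic_unit_vector(m, rng) for _ in range(2 ** bits)]


def quantize_direction(h: Vector, codebook: List[Vector]) -> int:
    """Index of the codeword c maximizing |c^H h|, i.e. the codeword closest to the
    direction of h in chordal distance (the channel norm is quantized separately)."""
    def corr(c: Vector) -> float:
        return abs(sum(ci.conjugate() * hi for ci, hi in zip(c, h)))
    return max(range(len(codebook)), key=lambda i: corr(codebook[i]))


# 5-bit fine codebook for a four-antenna BS.
codebook = random_vector_codebook(bits=5, m=4, seed=1)
# A channel whose direction equals codeword 7 up to a phase and a gain maps back to index 7.
h = [2.0 * cmath.exp(0.3j) * c for c in codebook[7]]
print(quantize_direction(h, codebook))  # 7
```

Because the selection metric ignores both the channel gain and a common phase rotation, only the channel direction is quantized, matching the separate norm/direction quantization described above.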

Before presenting our results, we consider an interval $k$ and a given scheduling decision, and describe the mapping rules alluded to in Section III-C. We determine a good direction (i.e., a unit-norm beamforming vector) for multicasting using the alternating-optimization-based multicast beamforming design algorithm of [?], which takes only the coarse estimates $h_{u_1}[k]$ and $h_{u_2}[k]$ as inputs, and we set the multicast beamforming vectors associated with this decision equal to this direction. The precoding matrix $W_{u_1}[k]$ is obtained by extending the naive zero-forcing design of conventional MU-MIMO to the model in (8). In particular, at interval $k$ the BS naively assumes that the coarse estimates $h_{u_1}[k], h_{u_2}[k]$ it has are indeed equal to their respective exact channels (and hence to their respective finer estimates). Then, at any future interval $\kappa$ (the knowledge of $\kappa$ is not assumed during interval $k$) when the pair $(u_1, u_2)$ is next scheduled, under the naive assumption the model in (8) would reduce to

+ n Ul [h 
n Ul [k, 2] 



yui[M - h U2 [K]x Ul [K] + TT^j^L - (21) 



(/i Ul [fc]*[fc,3]) " 2L J " 1L J (/^ [*;];#, 3])' 
To remove the dependence on $\kappa$, all noise covariances are averaged, so that (21) reduces to a point-to-point MIMO channel with channel matrix $[h_{u_1}[k]; h_{u_2}[k]]$ and noise covariance $\mathrm{diag}\{1 + \mathbb{E}[1/|h_{u_1}[\kappa]z[\kappa,2]|^2],\ \mathbb{E}[1/|h_{u_1}[\kappa]z[\kappa,3]|^2]\}$. Notice, however, that due to the power constraints these expected values in turn depend on the choice of the precoders $W_{u_1}[\kappa], W_{u_2}[\kappa]$. As a further simplification, we fix these expected values to suitable scalars that are determined offline. The precoder $W_{u_1}[k]$ can now be obtained using the standard point-to-point MIMO precoder design algorithm [7]. The precoder $W_{u_2}[k]$ is computed in an analogous manner. Finally, the norms of the precoding vectors are fixed as

IMMII = nil r^X rill and NMII = lift r^X r nr 
||/» u1 [«]W U! ,[k]|| \\h U2 [k]W u1 [k]\\ 
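The single-user step above is delegated to the standard point-to-point MIMO precoder design algorithm cited in the text. Assuming that design is the classical one (whiten the averaged noise covariance, take the SVD of the whitened channel $[h_{u_1}[k]; h_{u_2}[k]]$, and allocate power over its eigenmodes by water-filling), the power-allocation step can be sketched as below. Here `gains` are hypothetical eigenmode gains (squared singular values of the whitened channel) and `power` is the total power budget; the whitening and SVD steps are not shown.

```python
from typing import List


def water_filling(gains: List[float], power: float) -> List[float]:
    """Classical water-filling: p_i = max(mu - 1/g_i, 0) with sum_i p_i = power.

    gains[i] > 0 is the SNR gain of eigenmode i; the returned powers are in
    the input order.
    """
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    # Start with all modes active and drop the weakest one while it would get negative power.
    for k in range(len(order), 0, -1):
        active = order[:k]
        mu = (power + sum(1.0 / gains[i] for i in active)) / k  # water level
        if mu - 1.0 / gains[active[-1]] >= 0.0:
            break
    powers = [0.0] * len(gains)
    for i in active:
        powers[i] = max(mu - 1.0 / gains[i], 0.0)
    return powers


print(water_filling([1.0, 1.0], 2.0))   # [1.0, 1.0]
print(water_filling([4.0, 0.25], 1.0))  # [1.0, 0.0]
```

With a single active mode the water level always exceeds that mode's inverse gain, so the loop is guaranteed to terminate with a valid allocation.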

In Fig. 2 we compare the sum-rate utility obtained using conventional MU-MIMO, which only uses the current CSI, with that obtained using the myopic scheduling that uses only the delayed CSI (EMAT with delayed) and the myopic scheduling that uses the hybrid CSI (EMAT with hybrid), where for the latter two schemes the average rates are computed assuming both the sub-optimal and the optimal filtering. In all cases the channel norms were assumed to be perfectly quantized, whereas a 2-bit coarse codebook and a 5-bit fine codebook were employed to quantize the channel directions. As seen from the figure, conventional MU-MIMO becomes interference-limited, while the policy using the finer albeit delayed CSI offers significant gains, which are further improved by utilizing the hybrid CSI. The improvement is more pronounced when optimal filtering is used.

In Fig. 3 we consider the same setup as in the previous figure, but now compare the sum-rate utility obtained using the myopic scheduling that uses the hybrid CSI along with optimal filtering, for different codebook sizes. In particular, in all cases the channel norms were assumed to be perfectly quantized and a 2-bit coarse codebook was employed. Four fine-codebook sizes (5, 10, 12, and 16 bits) were compared, along with the case in which perfect delayed CSI is available to the BS. As seen from the figure, to capture the promised multiplexing gains the codebook sizes must scale sufficiently fast with SNR. We note here that the MAT and EMAT schemes have been designed with the goal of achieving degree-of-freedom improvements, where aligning (confining) interference to a low-dimensional subspace is the paramount concern. The substantial gap relative to the perfect delayed CSI performance observed at a fixed (finite) SNR can be alleviated via a proper precoder design that is optimized for finite SNR. We emphasize that the precoder optimization we undertook to produce this set of results was limited and adhered fully to the EMAT framework.

We also compared the sum rates obtained using our proposed policy and the myopic one for simpler examples having fewer states. We found that, for well-designed quantization codebooks, the myopic policy performs very close to the optimal frame-based policy. This observation, coupled with the fact that the complexity of the myopic policy scales much more benignly with the system size, makes it well suited to practical implementation.

V. Conclusions 

We considered the DL MU-MIMO scheduling problem with hybrid CSIT and proposed a frame-based joint scheduling and feedback approach that can be made arbitrarily close to optimal. Two important and interesting issues are the focus of our current research. The foremost one pertains to the exceedingly large number of states needed to accommodate practical system sizes, which makes implementation of the frame-based policy challenging even with commercial LP solvers. While the sparse nature of these linear programs can indeed be exploited, an efficient and significant reduction in the number of states is necessary. The second issue is the choice of the precoding matrices and vectors. Recall that in this work we have assumed the precoders to be pre-determined and fixed for each (state, action) pair. To fully exploit the precoding gains and the availability of "precoded pilots" in future networks, this restriction should be relaxed. Finally, we remark that incorporating practical considerations such as delay constraints on scheduling is another important open issue.
A. Extended MAT scheme

The MAT scheme is an interesting tool that was recently proposed to tackle the problem where no channel state estimates for the current interval are available at the BS, but perfect albeit delayed CSI is. The scheme uses such completely outdated CSIT but still achieves system degrees of freedom equal to 4/3. We recall that in our context, MU-MIMO with perfect and current CSIT achieves 2 system degrees of freedom, while single-user transmission achieves only one degree of freedom. In this paper, we build upon the following extended MAT scheme [6], which achieves the same system degrees of freedom as the original MAT scheme.

The scheme proceeds as follows. Time is divided into units referred to as rounds. Two messages $u$ and $v$ are to be transmitted, destined to users $i$ and $j$, respectively, where $u$ and $v$ are $M_t \times 1$ vectors. The three rounds are introduced next.

• Round 1: The transmitted signal is $x[1] = u + v$, and the corresponding received signals at users $i$ and $j$ are denoted by $y_i[1]$ and $y_j[1]$, where

$$y_i[1] = h_i[1](u + v) + n_i[1], \tag{22}$$
$$y_j[1] = h_j[1](u + v) + n_j[1], \tag{23}$$

where $n_i[1]$ denotes the additive noise at user $i$ in round 1 and $h_i[1] \in \mathbb{C}^{1\times M_t}$ denotes the channel response vector seen by user $i$ in round 1.

• Round 2: The transmitted signal is $x[2] = [h_i[1]v;\, 0]$, and the received signals at users $i$ and $j$ are, respectively,

$$y_i[2] = h_{i,1}[2]\cdot(h_i[1]v) + n_i[2], \tag{24}$$
$$y_j[2] = h_{j,1}[2]\cdot(h_i[1]v) + n_j[2], \tag{25}$$

where $h_{i,1}[2]$ denotes the channel coefficient modeling the propagation environment between user $i$ and the first transmit antenna at the BS during round 2.

• Round 3: The transmitted signal is $x[3] = [h_j[1]u;\, 0]$, and the received signals at users $i$ and $j$ are, respectively,

$$y_i[3] = h_{i,1}[3]\cdot(h_j[1]u) + n_i[3], \tag{26}$$
$$y_j[3] = h_{j,1}[3]\cdot(h_j[1]u) + n_j[3]. \tag{27}$$

It is assumed that the channel state vectors $h_i[1], h_i[2], h_i[3]$ are estimated perfectly by user $i$ at the start of each respective round, and similarly for user $j$. In addition, the BS is assumed to know the channel state vectors $h_i[\ell], h_j[\ell]$ perfectly, but only after round $\ell$, for $\ell = 1, 2, 3$. Further, user $i$ is also conveyed the channel vector $h_j[1]$ and user $j$ is conveyed the channel vector $h_i[1]$ before the start of round 3, via feed-forward signaling.
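A noise-free simulation helps check the bookkeeping of the three rounds. The sketch below assumes $M_t = 2$, so that each message carries two symbols and two equations suffice; it draws i.i.d. complex Gaussian channels, omits the noise terms, and recovers $u$ at user $i$ by solving the two interference-free linear equations obtainable from (22), (24) and (26), namely $y_i[1] - y_i[2]/h_{i,1}[2] = h_i[1]u$ and $y_i[3] = h_{i,1}[3]\,h_j[1]u$. The helper names are hypothetical.

```python
import random
from typing import List


def cgauss(rng: random.Random) -> complex:
    """One i.i.d. complex Gaussian channel coefficient."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def row_times_col(h: List[complex], x: List[complex]) -> complex:
    """(1 x 2 row vector) times (2 x 1 column vector)."""
    return h[0] * x[0] + h[1] * x[1]


def solve_2x2(a: List[List[complex]], b: List[complex]) -> List[complex]:
    """Solve a @ x = b for a 2 x 2 complex matrix via Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def emat_decode_user_i(u: List[complex], v: List[complex], seed: int = 0) -> List[complex]:
    """Noise-free run of the three rounds; returns user i's estimate of u."""
    rng = random.Random(seed)
    h_i = {r: [cgauss(rng), cgauss(rng)] for r in (1, 2, 3)}  # h_i[r], r = 1, 2, 3
    h_j1 = [cgauss(rng), cgauss(rng)]                          # h_j[1], fed forward to user i
    y1 = row_times_col(h_i[1], [u[0] + v[0], u[1] + v[1]])    # Round 1: x[1] = u + v
    y2 = h_i[2][0] * row_times_col(h_i[1], v)                  # Round 2: x[2] = [h_i[1] v; 0]
    y3 = h_i[3][0] * row_times_col(h_j1, u)                    # Round 3: x[3] = [h_j[1] u; 0]
    # Two interference-free equations in u:
    #   y1 - y2 / h_{i,1}[2] = h_i[1] u   and   y3 = h_{i,1}[3] h_j[1] u
    a = [h_i[1], [h_i[3][0] * h_j1[0], h_i[3][0] * h_j1[1]]]
    return solve_2x2(a, [y1 - y2 / h_i[2][0], y3])


u = [1 + 2j, -0.5j]
v = [3 + 0j, 1 - 1j]
print(emat_decode_user_i(u, v, seed=3))
```

Round 2 lets user $i$ strip its interference $h_i[1]v$ from the round-1 observation, while round 3 hands it a second, independent look at $u$; user $j$ decodes $v$ symmetrically.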



Therefore, after Round 3, the $i$th user can decode message $u$ using (22), (24) and (26) as follows:

$$y_i[1] - \frac{y_i[2]}{h_{i,1}[2]} = h_i[1]\,u + n_i[1] - \frac{n_i[2]}{h_{i,1}[2]},$$
$$y_i[3] = h_{i,1}[3]\,h_j[1]\,u + n_i[3].$$

Similarly, after Round 3, the $j$th user can decode message $v$. Notice that since the effective received observations seen by each user after three rounds can be modeled as the outputs of two linearly independent equations, each user achieves two degrees of freedom over three rounds, so that the system attains $4/3$ degrees of freedom.

B. Proof of Proposition 1

To bound the Lyapunov drift we proceed along the lines of [?] and first note that

$$Q_n[(\tau+1)T] \le \Big(Q_n[\tau T] - \sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big)^{+} + T r_n^*[\tau],$$

so that

$$Q_n^2[(\tau+1)T] \le \Big(Q_n[\tau T] - \sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big)^2 + \big(T r_n^*[\tau]\big)^2 + 2T r_n^*[\tau]\Big(Q_n[\tau T] - \sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big)^{+} \tag{28}$$

holds, which then yields the bound

$$Q_n^2[(\tau+1)T] - Q_n^2[\tau T] \le \Big(\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big)^2 + \big(T r_n^*[\tau]\big)^2 - 2Q_n[\tau T]\Big(\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j] - T r_n^*[\tau]\Big). \tag{29}$$

Using (29) we can bound the $T$-step Lyapunov drift as

$$\Delta_T(Q[\tau T]) \le \frac{1}{2T}\,\mathbb{E}\Big[\sum_{n=1}^{N}\Big(\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big)^2 + \big(T r_n^*[\tau]\big)^2 \,\Big|\, Q[\tau T]\Big] - \frac{1}{T}\,\mathbb{E}\Big[\sum_{n=1}^{N} Q_n[\tau T]\Big(\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j] - T r_n^*[\tau]\Big) \,\Big|\, Q[\tau T]\Big]. \tag{30}$$

Then, since $R_n^{\rm frame}[j]$, $\forall n, j$, can be bounded above by a constant and $r_n^*[\tau] \le r_{\max}$, $\forall n, \tau$, we obtain the bound

$$\Delta_T(Q[\tau T]) \le BT + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - \mathbb{E}\Big[\frac{1}{T}\sum_{n=1}^{N} Q_n[\tau T]\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j] \,\Big|\, Q[\tau T]\Big], \tag{31}$$

where $B$ is an appropriate, sufficiently large constant. The RHS of (31) can be manipulated to obtain

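$$\Delta_T(Q[\tau T]) \le BT + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - \sum_{n=1}^{N} Q_n[\tau T]\,R_n^* + \mathbb{E}\Big[\sum_{n=1}^{N} Q_n[\tau T]\Big(R_n^* - \frac{1}{T}\sum_{j=0}^{T-1} R_n^{\rm frame}[\tau T + j]\Big) \,\Big|\, Q[\tau T]\Big],$$

where $R^* = [R_1^*, \ldots, R_N^*]^T$ was defined in (18). Using the Cauchy-Schwarz inequality along with the fact that $\sum_{n=1}^{N} Q_n[\tau T] \ge \sqrt{\sum_{n=1}^{N} Q_n^2[\tau T]}$, we can then further upper bound

$$\Delta_T(Q[\tau T]) \le BT + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - \sum_{n=1}^{N} Q_n[\tau T]\,R_n^* + \Big(\sum_{n=1}^{N} Q_n[\tau T]\Big)\,\mathbb{E}\Big[\Big\| R^* - \frac{1}{T}\sum_{j=0}^{T-1} R^{\rm frame}[\tau T + j]\Big\| \,\Big|\, Q[\tau T]\Big]. \tag{32}$$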



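Invoking Lemma 2 along with the fact that $R^*$ is also bounded above, we can deduce that by choosing a large enough frame length we can ensure that

$$\mathbb{E}\Big[\Big\| R^* - \frac{1}{T}\sum_{j=0}^{T-1} R^{\rm frame}[\tau T + j]\Big\| \,\Big|\, Q[\tau T]\Big] \le \epsilon, \tag{33}$$

which when used in (32) yields

$$\Delta_T(Q[\tau T]) \le BT + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - \sum_{n=1}^{N} Q_n[\tau T]\,R_n^* + \epsilon\sum_{n=1}^{N} Q_n[\tau T]. \tag{34}$$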



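Recall that any vector $R$ in the $\epsilon$-interior of $\Lambda$ satisfies $R \le \hat{R} - \epsilon\mathbf{1}$ for some $\hat{R} \in \Lambda$. Then, appealing to the fact that $\sum_{n=1}^{N} Q_n[\tau T]R_n^*$ is the optimal value of the LP in (11), together with Lemma 1, we have that

$$\Delta_T(Q[\tau T]) \le BT + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - \sum_{n=1}^{N} Q_n[\tau T]\,(\hat{R}_n - \epsilon),$$

from which (20) follows, since $R_n \le \hat{R}_n - \epsilon$ and $Q_n[\tau T] \ge 0$ for all $n$.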
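The two elementary queue inequalities at the start of this proof, namely the frame-level bound on $Q_n[(\tau+1)T]$ and the resulting squared-difference bound (29), can be spot-checked numerically. The sketch below assumes the interval-by-interval virtual-queue update $Q \leftarrow \max(Q - R, 0) + r_n^*[\tau]$ with nonnegative service rates, and draws hypothetical queue lengths and rates uniformly at random; it is a sanity check, not a proof.

```python
import random


def check_frame_bound(trials: int = 10000, T: int = 5, r_max: float = 2.0,
                      R_max: float = 3.0, seed: int = 0) -> bool:
    """Return True if (29) holds on every randomly drawn frame."""
    rng = random.Random(seed)
    for _ in range(trials):
        q0 = rng.uniform(0.0, 20.0)                        # Q_n[tau T]
        r_star = rng.uniform(0.0, r_max)                   # per-interval arrival r*_n[tau]
        served = [rng.uniform(0.0, R_max) for _ in range(T)]
        q = q0
        for rate in served:                                # interval-by-interval update
            q = max(q - rate, 0.0) + r_star
        S = sum(served)
        lhs = q * q - q0 * q0                              # Q_n^2[(tau+1)T] - Q_n^2[tau T]
        rhs = S * S + (T * r_star) ** 2 - 2.0 * q0 * (S - T * r_star)
        if lhs > rhs + 1e-9:
            return False
    return True


print(check_frame_bound())  # True
```

When the queue never empties within the frame, the slack in (29) equals $2\,T\,r_n^*[\tau]\sum_j R_n^{\rm frame}[\tau T + j]$, which is why the bound is loose but safe.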



C. Proof Sketch of Theorem 1

We leverage some of the techniques used in [?], but we emphasize that the policies considered there were not frame-based and Markov decision processes were not employed in that work. Using the result in (20) (after assuming a large enough frame length) and subtracting the term $VU(r^*[\tau])$ from both sides, we first obtain



$$\Delta_T(Q[\tau T]) - VU(r^*[\tau]) \le BT - \sum_{n=1}^{N} Q_n[\tau T]\,R_n + \sum_{n=1}^{N} Q_n[\tau T]\,r_n^*[\tau] - VU(r^*[\tau]). \tag{35}$$



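Then, recalling that $r^*[\tau]$ is the optimal solution to (14), we have that for any $v$ feasible for (14) (in particular, $v \le r_{\max}\mathbf{1}$)

$$\Delta_T(Q[\tau T]) - VU(r^*[\tau]) \le BT - \sum_{n=1}^{N} Q_n[\tau T]\,R_n + \sum_{n=1}^{N} Q_n[\tau T]\,v_n - VU(v). \tag{36}$$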
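The proof uses $r^*[\tau]$ only through its optimality for (14), i.e., through the fact that it minimizes $\sum_n Q_n[\tau T]v_n - VU(v)$ over feasible $v$. For illustration only, assuming the sum-log (proportional-fair) utility $U(r) = \sum_n \log r_n$ and the feasible set $0 < v \le r_{\max}\mathbf{1}$ (neither is fixed in this part of the paper), the minimizer is separable and has the closed form $r_n^*[\tau] = \min(V/Q_n[\tau T], r_{\max})$. The function names are hypothetical.

```python
import math
from typing import List


def flow_control_log(q: List[float], V: float, r_max: float) -> List[float]:
    """Minimize sum_n q_n v_n - V * sum_n log(v_n) over 0 < v_n <= r_max.

    The objective is separable and convex in each v_n; the stationary point of
    q_n v_n - V log(v_n) is v_n = V / q_n, which is clipped at r_max (an empty
    queue, q_n = 0, therefore always receives r_max).
    """
    return [r_max if qn <= V / r_max else V / qn for qn in q]


def objective(q: List[float], v: List[float], V: float) -> float:
    """Value of sum_n q_n v_n - V * U(v) for the sum-log utility."""
    return sum(qn * vn for qn, vn in zip(q, v)) - V * sum(math.log(vn) for vn in v)


q = [0.0, 5.0, 50.0]
r_star = flow_control_log(q, V=10.0, r_max=1.0)
print(r_star)  # [1.0, 1.0, 0.2]
```

Larger $V$ admits more virtual traffic for the same backlog, which mirrors the $BT/V$ utility gap in Theorem 1.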

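Averaging both sides of (36) with respect to $Q[\tau T]$, we obtain

$$\frac{1}{T}\mathbb{E}[L(Q[(\tau+1)T])] - \frac{1}{T}\mathbb{E}[L(Q[\tau T])] - V\,\mathbb{E}[U(r^*[\tau])] \le BT - \sum_{n=1}^{N}\mathbb{E}[Q_n[\tau T]]\,R_n + \sum_{n=1}^{N}\mathbb{E}[Q_n[\tau T]]\,v_n - VU(v). \tag{37}$$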

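Noting that $Q_n[0] = 0$, $\forall n$, and summing (37) over $\tau = 0, 1, \ldots, t-1$, we get

$$\frac{1}{T}\mathbb{E}[L(Q[tT])] - \sum_{\tau=0}^{t-1} V\,\mathbb{E}[U(r^*[\tau])] \le BTt - \sum_{n=1}^{N}\sum_{\tau=0}^{t-1}\mathbb{E}[Q_n[\tau T]]\,R_n + \sum_{n=1}^{N}\sum_{\tau=0}^{t-1}\mathbb{E}[Q_n[\tau T]]\,v_n - tVU(v),$$

which, when combined with the fact that $\frac{1}{T}\mathbb{E}[L(Q[tT])] \ge 0$, yields

$$\frac{1}{t}\sum_{n=1}^{N}\sum_{\tau=0}^{t-1}\mathbb{E}[Q_n[\tau T]]\,(R_n - v_n) \le BT + \frac{1}{t}\sum_{\tau=0}^{t-1} V\,\mathbb{E}[U(r^*[\tau])] - VU(v). \tag{38}$$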



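Next, choosing any $R \in \Lambda_\epsilon$ and $v$ such that $v = R - \delta\mathbf{1}$ and $v \le r_{\max}\mathbf{1}$ for some $\delta > 0$, and substituting in (38), we get that

$$\frac{\delta}{t}\sum_{n=1}^{N}\sum_{\tau=0}^{t-1}\mathbb{E}[Q_n[\tau T]] \le BT + \frac{1}{t}\sum_{\tau=0}^{t-1} V\,\mathbb{E}[U(r^*[\tau])] - VU(v),$$

which, using the componentwise non-decreasing property of the utility function (recall that $r^*[\tau] \le r_{\max}\mathbf{1}$), yields

$$\frac{\delta}{t}\sum_{n=1}^{N}\sum_{\tau=0}^{t-1}\mathbb{E}[Q_n[\tau T]] \le BT + VU(r_{\max}\mathbf{1}) - VU(v), \quad \forall t. \tag{39}$$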

Then, since $Q_n[\tau T + j] \le Q_n[\tau T] + j\,r_{\max}$, $\forall n, j$, and $U(v) \ge \theta > -\infty$ for some constant $\theta$, from (39) we can conclude that $\frac{1}{J}\sum_{n=1}^{N}\sum_{j=0}^{J-1}\mathbb{E}[Q_n[j]]$ is also bounded above by a constant for all $J$, which proves that all virtual queues are strongly stable under the frame-based policy. Letting $A_n[\tau T + j] = r_n^*[\tau]$, $\forall\, 0 \le j \le T-1$, $\tau = 0, 1, \ldots$, denote the per-slot virtual arrival rate, from the strong stability of each virtual queue, the uniformly bounded arrival rates and the uniform continuity of the utility function, we can deduce that

$$\liminf_{J\to\infty} U\Big(\frac{1}{J}\sum_{j=0}^{J-1} R^{\rm frame}[j]\Big) \ge \liminf_{J\to\infty} U\Big(\frac{1}{J}\sum_{j=0}^{J-1}\mathbb{E}[A[j]]\Big). \tag{40}$$



Finally, setting $R = v = r^{\rm opt}$ in (38), we obtain

$$\frac{1}{t}\sum_{\tau=0}^{t-1} V\,\mathbb{E}[U(r^*[\tau])] \ge VU(r^{\rm opt}) - BT, \tag{41}$$

which, upon invoking the concavity of the utility function and the linearity of the expectation operator, yields

$$U\Big(\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}[r^*[\tau]]\Big) \ge U(r^{\rm opt}) - BT/V. \tag{42}$$



Notice then that, due to the uniform continuity of the utility function,

$$\liminf_{t\to\infty} U\Big(\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}[r^*[\tau]]\Big) = \liminf_{J\to\infty} U\Big(\frac{1}{J}\sum_{j=0}^{J-1}\mathbb{E}[A[j]]\Big), \tag{43}$$

which when used in (42) yields

$$\liminf_{J\to\infty} U\Big(\frac{1}{J}\sum_{j=0}^{J-1}\mathbb{E}[A[j]]\Big) \ge U(r^{\rm opt}) - BT/V. \tag{44}$$

Using (44) and (40) yields the desired result.
Fig. 2. Comparison with conventional MU-MIMO
Fig. 3. Comparison for different codebook sizes



